In [1]:
%matplotlib inline
from __future__ import division
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import gzip
import cPickle as pickle
In [2]:
filename = 'ERCOT wind data.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
ercot = pd.read_csv(fullpath, index_col=0)
In [3]:
ercot.head()
Out[3]:
In [4]:
ercot.loc[:,'Net Load (MW)'] = ercot.loc[:,'ERCOT Load, MW'] - ercot.loc[:,'Total Wind Output, MW']
In [5]:
ercot.head()
Out[5]:
In [6]:
ercot.loc[:,'Net Load Change (MW)'] = ercot.loc[:,'Net Load (MW)'].diff()
In [7]:
ercot.loc[:,'DATETIME'] = pd.to_datetime(ercot.index)
In [8]:
ercot.head()
Out[8]:
In [9]:
# load a zipped pickle file
# from http://stackoverflow.com/questions/18474791/decreasing-the-size-of-cpickle-objects
def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = pickle.load(f)
        return loaded_object
In [10]:
filename = 'EPA hourly dictionary.pgz'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
epaDict = load_zipped_pickle(fullpath)
In [11]:
epaDict.keys()
Out[11]:
In [12]:
def plant_gen_delta(df):
    """
    For every plant in the input df, calculate the change in gross load (MW)
    from the previous hour.
    input:
        df: dataframe of EPA clean air markets data
    return:
        df: concatenated dataframe of all plants with a 'Gen Change (MW)' column added
    """
    df_list = []
    for plant in df['PLANT_ID'].unique():
        temp = df.loc[df['PLANT_ID'] == plant, :]
        gen_change = temp.loc[:, 'GROSS LOAD (MW)'].values - temp.loc[:, 'GROSS LOAD (MW)'].shift(1).values
        temp.loc[:, 'Gen Change (MW)'] = gen_change
        df_list.append(temp)
    return pd.concat(df_list)
In [13]:
allEPA = pd.concat(epaDict)
In [14]:
allEPA.fillna(0, inplace=True)
allEPA.head()
Out[14]:
In [15]:
allEPA = plant_gen_delta(allEPA)
In [16]:
allEPA.reset_index(drop=True, inplace=True)
In [17]:
allEPA.tail()
Out[17]:
In [18]:
cluster_df = pd.read_csv('Cluster labels.csv')
In [19]:
merged_epa_cluster = pd.merge(allEPA, cluster_df, left_on=['PLANT_ID', 'YEAR'],
right_on=['plant_id', 'year'])
In [20]:
merged_epa_cluster.head()
Out[20]:
In [21]:
grouped_clusters = merged_epa_cluster.loc[:,['Gen Change (MW)', 'GROSS LOAD (MW)', 'DATETIME', 'cluster_id_6']].groupby(['DATETIME', 'cluster_id_6']).sum()
In [22]:
grouped_clusters.reset_index(inplace=True)
In [23]:
grouped_clusters.dtypes
Out[23]:
In [24]:
grouped_clusters_ercot = pd.merge(grouped_clusters, ercot, on='DATETIME')
In [25]:
grouped_clusters_ercot.loc[:,'year'] = grouped_clusters_ercot.loc[:,'DATETIME'].apply(lambda x: x.year)
grouped_clusters_ercot.head()
Out[25]:
In [100]:
g = sns.FacetGrid(grouped_clusters_ercot, col='year', col_wrap=3, hue='cluster_id_6')
g.map(plt.scatter, 'Net Load Change (MW)', 'Gen Change (MW)', facecolors='None')
g.set_xticklabels(rotation=30)
g.add_legend()
Out[100]:
Generally the figure below shows that each cluster tends to behave somewhat linearly (hourly change in generation vs. change in net load), but there are a couple of exceptions.
In [109]:
sns.lmplot('Net Load Change (MW)', 'Gen Change (MW)', data=filtered_data,
col='year', row='cluster_id_6', hue='cluster_id_6', robust=True, ci=None)
Out[109]:
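As a rough check on how linear each cluster's response is, the slope and R^2 of a simple least-squares fit can be computed per cluster and year. This is a sketch rather than part of the original analysis; it assumes scipy is available and uses the same columns as the plots above.
In [ ]:
# Sketch: slope and R^2 of Gen Change vs Net Load Change for each (cluster, year).
# scipy is an added dependency here; dropna guards against the NaN first-hour changes.
from scipy import stats

def cluster_linearity(df):
    rows = []
    for (cluster, year), grp in df.groupby(['cluster_id_6', 'year']):
        grp = grp.dropna(subset=['Net Load Change (MW)', 'Gen Change (MW)'])
        slope, intercept, r, p, se = stats.linregress(grp['Net Load Change (MW)'],
                                                      grp['Gen Change (MW)'])
        rows.append({'cluster_id_6': cluster, 'year': year,
                     'slope': slope, 'r_squared': r ** 2})
    return pd.DataFrame(rows)

cluster_linearity(grouped_clusters_ercot).head()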
The figure below shows how/if ramping changes as wind generation increases from 2007 to 2015. Cluster 3 is especially interesting because it contains high-efficiency NG plants with average capacity just under 1 GW. Over time it starts ramping up (and down) by larger amounts. Some other clusters show similar behavior, while others (like 0, 1, and 2) don't change nearly as much.
In [102]:
filtered_data = grouped_clusters_ercot.loc[grouped_clusters_ercot['year'].isin([2007, 2011, 2015])]
g = sns.FacetGrid(filtered_data, col='year', row='cluster_id_6')
g.map(plt.scatter, 'Total Wind Output, MW', 'Gen Change (MW)', facecolors='None')
g.set_xticklabels(rotation=30)
g.add_legend()
Out[102]:
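As a quick numeric companion to the plots above (a sketch, not in the original notebook), the spread of hourly generation changes per cluster and year gives one way to see whether ramps are getting larger over time.
In [ ]:
# Standard deviation of the hourly generation change, by cluster and year.
# A growing spread for a cluster (e.g. cluster 3) suggests larger up/down ramps.
ramp_spread = (grouped_clusters_ercot
               .groupby(['cluster_id_6', 'year'])['Gen Change (MW)']
               .std()
               .unstack('year'))
ramp_spread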
In [26]:
# Load EIA 860 from pickle
eia860fname = "EIA 860.pkl"
eia860Full = os.path.join(path, eia860fname)
eia860Dict = pickle.load(open(eia860Full, "rb"))
# Add year as a column
for k in eia860Dict.keys():
    eia860Dict[k]["Year"] = k
# Flatten dictionary, rename columns, and do inner join
merged860 = pd.concat(eia860Dict)
merged860.columns = ["plant_id", "nameplate_capacity", "year"]
merged860 = cluster_df.merge(merged860, on=["plant_id", "year"])
groupCapacity = merged860.loc[:,["cluster_id_6", "year", "nameplate_capacity"]].groupby(by=["cluster_id_6", "year"]).sum()
In [27]:
groupCapacity.head()
Out[27]:
In [29]:
groupCapacity.reset_index(inplace=True)
In [30]:
groupCapacity.head()
Out[30]:
In [31]:
# Yes, I realize this is a horrible name
grouped_clusters_ercot_capacity = pd.merge(groupCapacity, grouped_clusters_ercot, on=['cluster_id_6', 'year'])
In [32]:
grouped_clusters_ercot_capacity.head()
Out[32]:
In [33]:
grouped_clusters_ercot_capacity.describe()
Out[33]:
In [34]:
filename = 'Fuel prices.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
fuel_prices = pd.read_csv(fullpath, index_col=0)
In [35]:
fuel_prices.head()
Out[35]:
In [36]:
grouped_clusters_ercot_capacity.loc[:,'Month'] = grouped_clusters_ercot_capacity.loc[:,'DATETIME'].apply(lambda x: x.month)
# Make 'year' capitalization the same
grouped_clusters_ercot_capacity = grouped_clusters_ercot_capacity.rename(columns = {'year':'Year'})
In [51]:
X = pd.merge(grouped_clusters_ercot_capacity, fuel_prices, on=['Year', 'Month'])
In [52]:
X.head()
Out[52]:
In [53]:
X.tail()
Out[53]:
In [54]:
y = X.loc[:,['DATETIME', 'cluster_id_6', 'Gen Change (MW)']]
In [55]:
y.head()
Out[55]:
In [56]:
y.describe()
Out[56]:
In [57]:
X.drop('Gen Change (MW)', axis=1, inplace=True)
In [58]:
X_fn = 'X.csv'
y_fn = 'y.csv'
X_path = os.path.join(path, X_fn)
y_path = os.path.join(path, y_fn)
X.to_csv(X_path)
y.to_csv(y_path)
Still need the total capacity of each cluster at that hour (determined by year). Anything else?
X (for every cluster and every hour): if we have time and need to improve performance, maybe add the change in cluster generation over earlier timesteps (e.g. from two hours ago to one hour ago); a sketch of that lagged feature follows below.
y (for every cluster and every hour)
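A minimal sketch of how that lagged feature could be added, computed on the grouped data before 'Gen Change (MW)' is dropped from X; the new column name is hypothetical.
In [ ]:
# Hypothetical lagged feature: each cluster's generation change one hour earlier
# (the change from two hours ago to one hour ago). Sort by time within each
# cluster so shift(1) picks up the previous hour.
lagged = (grouped_clusters_ercot_capacity
          .sort_values(['cluster_id_6', 'DATETIME'])
          .groupby('cluster_id_6')['Gen Change (MW)']
          .shift(1))
grouped_clusters_ercot_capacity.loc[:, 'Gen Change t-1 (MW)'] = lagged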
In [ ]:
filename = 'EPA with hourly gen change.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
allEPA.to_csv(fullpath)
In [98]:
plants = allEPA.loc[allEPA['PLANT_ID'].isin([298, 3439]),:]
In [99]:
plants.tail()
Out[99]:
In [32]:
plants.index = plants['DATETIME']
In [38]:
test = pd.merge(ercot, plants, left_index=True, right_index=True)
In early years the generation change is almost independent of the demand change. This slowly shifts over time, so that the generation change becomes correlated with the change in net load.
In [40]:
g = sns.FacetGrid(test[test['PLANT_ID']==298], col='YEAR', col_wrap=3)
g.map(sns.regplot, 'Net Load Change (MW)', 'Gen Change (MW)')
g.set_xticklabels(rotation=30)
plt.suptitle('Coal plant 298', y=1.02, size=15)
Out[40]:
In [35]:
g = sns.FacetGrid(test[test['PLANT_ID']==3439], col='YEAR', col_wrap=3)
g.map(sns.regplot, 'Net Load Change (MW)', 'Gen Change (MW)')
g.set_xticklabels(rotation=30)
plt.suptitle('Natural gas plant 3439', y=1.02, size=15)
Out[35]:
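To put a number on that trend, the per-year correlation between the net load change and each plant's generation change can be computed from the same merged frame (a sketch, assuming the column names used in the plots above).
In [ ]:
# Pearson correlation between hourly net load change and a plant's generation
# change, by year; values closer to 1 in later years would support the claim above.
def yearly_corr(df, plant_id):
    sub = df[df['PLANT_ID'] == plant_id]
    return sub.groupby('YEAR').apply(
        lambda g: g['Net Load Change (MW)'].corr(g['Gen Change (MW)']))

yearly_corr(test, 298)  # coal plant 298; repeat with 3439 for the gas plant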
In [ ]: